Internal Clustering Evaluation of Data Streams

Authors

  • Marwan Hassani
  • Thomas Seidl
Abstract

Clustering validation is a crucial part of choosing the clustering algorithm that performs best on a given input dataset. Internal clustering validation is efficient and realistic, whereas external validation requires a ground truth, which is not available in most applications. In this paper, we analyze the properties and performance of eleven internal clustering measures. In particular, as the importance of streaming data grows, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings of evolving data streams. A series of experiments shows that, unlike in the static-data case, the Calinski-Harabasz index performs best in coping with common aspects and errors of stream clustering.
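
For readers unfamiliar with the Calinski-Harabasz index named in the abstract, the following is a minimal sketch of its standard textbook definition: the between-cluster dispersion divided by (k - 1), over the within-cluster dispersion divided by (n - k), where higher values suggest compact, well-separated clusters. The function name `calinski_harabasz` and the toy points are illustrative assumptions, not taken from the paper, and the sketch scores a single static snapshot rather than reproducing the authors' stream evaluation setup.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Standard Calinski-Harabasz index for one clustering snapshot.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their assigned cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n clusters")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # size-weighted squared distance of centroids to global mean
    within = 0.0   # squared distance of points to their own centroid
    for members in clusters.values():
        centroid = mean(members)
        between += len(members) * sqdist(centroid, overall)
        within += sum(sqdist(p, centroid) for p in members)

    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data (assumed): two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # respects the two groups
bad = [0, 1, 0, 1]   # mixes the two groups
print(calinski_harabasz(points, good), calinski_harabasz(points, bad))
```

On this toy data the labeling that respects the two groups scores far higher than the one that mixes them, which is the behavior an internal measure is expected to show without any ground truth.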

Related resources

Benchmarking Stream Clustering Algorithms within the MOA Framework

In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real time clustering analysis for streams is needed. A multitude of stream clustering algorithms were introduced. However, assessing the effectiveness of such an algorithm is challenging, because up to now there is no tool that allows a direct comparison of these algorithms. We pre...

Effective Evaluation Measures for Subspace Clustering of Data Streams

Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data, and cannot reflect the quality of the evolving nature of data streams. On the other ...

A New Mathematical Model for the Prediction of Internal Recirculation in Impinging Streams Reactors

A mathematical model for the prediction of internal recirculation of complex impinging stream reactors has been presented. The model constitutes a repetition of a series of ideal plug flow reactors and CSTR reactors with recirculation. The simplicity of the repeating motif allows for the derivation of an algebraic relation of the whole system using the Laplace transform. An impinging stream...

Adaptive Mining Techniques for Data Streams using Algorithm Output Granularity

Mining data streams is an emerging area of research given the potentially large number of business and scientific applications. A significant challenge in analyzing/mining data streams is the high data rate of the stream. In this paper, we propose a novel approach to cope with the high data rate of incoming data streams. We termed our approach “algorithm output granularity”. It is a resource-aw...

Divisive clustering of high dimensional data streams

Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting c...

Publication date: 2015